Optimal learning for sequential sampling with non-parametric beliefs
نویسندگان
چکیده
We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a di erent bandwidth to achieve better aggregation. The nal estimate uses a weighting scheme with the inverse mean square errors of the kernel estimators as weights. This weighting scheme is shown to be optimal under independent kernel estimators. For choosing the measurement, we employ the knowledge gradient method, a myopic policy that relies on predictive distributions to calculate the optimal sampling point. Our method allows a setting where the beliefs are expected to be correlated but the correlation structure is unknown beforehand. This is an extension of the known knowledge gradient with correlated beliefs. Moreover, the proposed policy is asymptotically optimal.
منابع مشابه
Learning for non-stationary Dirichlet processes
The Dirichlet process prior (DPP) is used to model an unknown probability distribution, F : This eliminates the need for parametric model assumptions, providing robustness in problems where there is significant model uncertainty. Two important parametric techniques for learning are extended to this non-parametric context for the first time. These are (i) sequential stopping, which proposes an o...
متن کاملReinforcement Learning and Design of Nonparametric Sequential Decision Networks
In this paper we discuss the design of sequential detection networks for nonparametric sequential analysis. We present a general probabilistic model for sequential detection problems where the sample size as well as the statistics of the sample can be varied. A general sequential detection network handles three decisions. First, the network decides whether to continue sampling or stop and make ...
متن کاملOptimal Filtering for Non-parametric Observation Models: Applications to Localization and SLAM
In this work we address the problem of optimal Bayesian filtering for dynamic systems with observation models that cannot be approximated properly as any parameterized distribution. In the context of mobile robots this problem arises in localization and simultaneous localization and mapping (SLAM) with occupancy grid maps. The lack of a parameterized observation model for these maps forces a sa...
متن کاملToward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making
Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests o...
متن کاملLearning and Optimization for Sequential Decision Making 02 / 01 / 16 Lecture 4 : Thompson Sampling ( part 1 )
Consider the problem of learning a parametric distribution from observations. A frequentist approach to learning considers parameters to be fixed, and uses the data learn those parameters as accurately as possible. For example, consider the problem of learning Bernoulli distribution’s parameter ( a random variable is distributed as Bernoulli(μ) is 1 with probability μ and 0 with probability 1 −...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- J. Global Optimization
دوره 58 شماره
صفحات -
تاریخ انتشار 2014